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Abstract 



We study the kinetics of protein folding via statistical energy landscape 
theory. We concentrate on the local-connectivity case, where the con- 
figurational changes can only occur among neighboring states, with the 
folding progress described in terms of an order parameter given by the 
fraction of native conformations. The non-Markovian diffusion dynam- 
ics is analyzed in detail and an expression for the mean first-passage 
time (MFPT) from non-native unfolded states to native folded state is 
obtained. It was found that the MFPT has a V-shaped dependence on 
the temperature. We also find that the MFPT is shortened as one in- 
creases the gap between the energy of the native and average non-native 
folded states relative to the fluctuations of the energy landscape. The 
second- and higher-order moments are studied to infer the first-passage 
time (FPT) distribution. At high temperature, the distribution becomes 
close to a Poisson distribution, while at low temperatures the distribu- 
tion becomes a Levy-like distribution with power-law tails, indicating a 
non-self-averaging intermittent behavior of folding dynamics. We note 
the likely relevance of this result to single-molecule dynamics experi- 
ments, where a power law (Levy) distribution of the relaxation time of 
the underlined protein energy landscape is observed. 
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I. INTRODUCTION 



The study of diffusion along a statistical energy landscape is a very important issue for 
many fields. In the field of protein folding, the crucial question is how the many possible 
configurations of polypeptide chain dynamically converge to one particular folded state [[!]. 
Clearly, a statistical description is needed for a large number of configurational states. Re- 
cently, a new view of protein folding based on energy-landscape theory was developed @-§| • 
In this theory, there exists a global bias of the energy landscape towards the folded state due 
to natural evolution selection. Superimposed on this is the fluctuation or the roughness of 
the energy landscape coming from conflicting interactions of the amino acid residues. The 
resulting protein-folding energy landscape is like a funnel. The funnel picture of folding is 
currently in agreement with experiments ]7| and consistent with both lattice and off-lattice 
simulations P~PH. 

It is very important to discuss the dynamics of folding and the nature of the pathways on 
the funnel-like energy landscape. Initially, there are multiple routes towards native folded 
states. Due to the competition between roughness of the landscape and entropy, down in 
the funnel, there may exist local glassy traps (minima). Then discrete pathways leading to 
folding emerge. It is crucial to determine the influence of the global bias towards folded 
state on the actual folding process itself. Following the study of the thermodynamics of the 
folding-energy landscape, the kinetics of folding along the order parameter that represents 
the progress of folding towards the native state can be discussed. Although in the multi- 
dimensional state space, the states are all locally connected, the singled-out order parameter 
(for example, p, the fraction of native configurations ,or q, the fraction of native contacts) 
which represents how close structurally the protein is towards its native state, may or may 
not have local connectivity. When the kinetic process is fast, either because of the large 
thermodynamic driving force or because the process is activationless, the native state can 
either be reached in one shot or through some intermediates that are formed very rapidly 
( the unravelling from those intermediate states or traps is often needed to reach the final 
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native state). This is the case in which the states in order-parameter space can move 



globally from one to another in an essentially discontinuous way ||TT|. On the other hand, if 
the kinetics are slow due to the nature of the activation folding process, then in general, the 
states are locally connected in order-parameter space. The dynamic process can be studied 
by a kinetic master equation, and in the local connectivity limit, the kinetic equation reduces 
to a diffusion equation. 

When the energy landscape is smooth, the average diffusion time is a good parameter 
for characterizing the dynamical process. On the other hand, when the energy landscape is 
rough, there exist large fluctuations of the energies, and the diffusion time is expected to 
fluctuate very much around its mean. In that case the average diffusion time is no longer 
a good parameter to characterize the dynamics. One needs to know the full distribution of 
the diffusion times in characterizing the folding process. 

It is worth mentioning that the aforementioned problem has many potentially important 
applications in other fields too. For example, in considering glasses, viscous liquids, and spin 



glasses |TB[ , the distribution of barriers, and therefore the distribution of diffusion times, is 
crucial in understanding the kinetics near the trapping or glass transition temperature, where 
the energy landscape becomes rough. 



In the field of protein dynamics, pioneering work by Frauenfelder et al. |14| has shown 
experimentally that the energy landscape is complex through the study of CO rebinding to 
Mb after photo dissociation. Their work reveals evidence for conformational substates that 
are organized in a hierarchical fashion. The study of dynamics along this energy landscape is 
very important in understanding the rebinding kinetics as well as the underlying structure of 
the landscape. This would help to find the interrelationships between structures, function, 
and dynamics of proteins ]T3|. At low temperatures, the energy landscape becomes rough, 



with the fluctuation of the landscape causing non-self-averaging behavior in diffusion time. 
Here an understanding of the distribution of the barriers, and therefore the diffusion-time 
distribution, is necessary in order to adequately characterize the kinetics of the whole system. 
The average diffusion time along the folding funnel has been studied both analytically 



TT| . |T6|JT^| and numerically jHH§. In this paper, we obtain results for the MFPT (mean 



first-passage time) as a function of the temperature, the energy gap Se, and roughness Ae 
of the folding landscape by solving a random-energy based model ||. The MFPT is found 
to be shorter when the ratio 5e/Ae is increased. We investigate the fluctuation or variance 
and the higher-order moments of FPT (first-passage time) as well as the FPT distribution 
function. Above a kinetic transition temperature To, the FPT is well behaved, and the 
distribution tends to be Poissonian. On the other hand, as the temperature decreases, 
the distribution of FPT starts to become broader around and below To. FPT distribution 
develops a power-law tail, approaching a Levy distribution, extending over large scale of 
time. The non-self-averaging behavior of the kinetics gradually dominates. Within this 
temperature regime the behavior of the distribution function is often needed to adequately 
capture the whole kinetics. By solving a corresponding unfolding process we also obtain 
a folding transition temperature Tf, defined by the point where the folding and unfolding 
curves intersect. The results show strong correlations between Tf and T . 

It is worth pointing out that the reactions and activated barrier crossings are stochastic 
events. The laws of chemical kinetics are merely statistical laws describing the average 
behavior of populations. It is now possible to measure the reaction dynamics of individual 
molecules in the laboratory |l7|,r8fl . This opens the way for the statistics of reaction events 



to be directly tested. When the traditional phenomenology based on simple rate laws is valid 
for large populations, experiments on individual molecules or small numbers of them should 
give simple statistics. Generally, on complex energy landscapes such as biomolecules, the 
populations often do not obey simple exponential decay laws, and the activation processes 
often do not follow the simple Arrhenius law. The study of the statistics of individual 
molecular reaction events can greatly clarify these more subtle reaction processes. 

At the level of large populations, the barrier-crossing picture has been shown to often 
provide an adequate characterization of the non-exponential kinetics usually seen in complex 
systems. The dynamics of the barrier crossing on a fluctuating energy landscape itself clearly 
leads to non-Poissonian statistics for individual molecules. The environmental fluctuations 



lead to "intermittency" [§,|T^]. The intermittency reflects the fact that certain relatively rare 
configurations of the environments are the ones that most favor the reaction. This means 
that the distribution develops fatty tails. The dynamics and folding of proteins of single 
molecules also exhibits similar behavior p0| . In particular, the recent experiments carried 
out on single-molecule enzymatic dynamics showed this intermittency phenomena and 
the distribution of reaction time of underline energy landscape of proteins approached Levy- 
like distribution (or power-law decay). 

We organize the paper as follows: We first establish the theoretical foundations and give 
detail steps in studying folding diffusion dynamics in the Theory section. Then in the section 
of Results and Discussions we give a thorough discussions on the results of both MFPT and 
distributions of FPT. The connection between our theoretical results yielding a Levy-like 
distribution of FPT and protein-dynamics experiments is also discussed. At the end, we 
give the conclusions. An appendix is added to give further details of the derivation of the 
diffusion equation. 



II. THEORY 

To describe the folding process, it is often convenient to define an order parameter that 
characterizes the degree of folding. In this paper, we use the fraction of native conforma- 
tions or native bond angles as the order parameter p. Furthermore, we assume the local 
connectivity condition here, assuming that the dynamics changes continuously with p. 

The model we study here was introduced earlier 0,0] • The problem of protein folding 
dynamics can be illustrated as random walks on a rough energy landscape. In this model, 
the energy landscape is generated by a random energy model ||22|| , which assumes that 
the energies of non-native states and their interactions are random variables with given 
probability distributions. In this model polypeptide chain, there are N residues, and for 
each residue there are v + 1 allowed conformational states. A simplified version of the 
Hamiltonian or the protein energy function is: 
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H = -^ej(tti) - Ji,i+i(ai,a i+1 ) - ^ K itj (a i: atj) 



(1) 



where the summation indices i and j are labels for the residues in a polypeptide chain, and 
represents the conformational state of the ith residue. The first term represents the one-body 
potential. The second term represents the interactions of nearest neighbors in sequence, and 
the last term represents the two-body interactions of amino acids distant in sequence but 
close in space. In this random energy model, the energy terms q, J^+i, and for native 
states are fixed to be eo, Jo, and K , respectively, whereas for non-native states they are 
generated by independent random variables. For simplicity the probability distributions of 
these random variables are assumed to be Gaussians with means e, J, K and widths Ae, A J, 
AK separately. Therefore the probability distribution function P(E, N ) =< 5(E — H) > 
for a protein with total energy E and N native amino acids (therefore p = N /N) is also a 
Gaussian with mean 



E(N ) = -N 
= -N 



N, 



N \N J V N J \ \N 



pe + p 2 L + (1 -p)e+ (l-p 2 )L 



(2) 



and width 



AE(N ) = \ N 



1^ 1/2 



{N [(l-p)A e 2 + (l- P 2 )AL 2 ]} 1/2 , 



(3) 



where L = J + zK , L = J + zK, and AL 2 = A J 2 + zAK 2 . z is the average number 
of neighbors surrounding a residue distant in sequence. Since the spatial collapse is usually 
fast compared with the rest of the folding process, this dynamical effect is ignored here 
for simplicity and it is assumed here that z is a constant throughout the folding process. 
By this random-energy construction one can easily generate energy surfaces with roughness 
controlled by Ae and AL and global bias determined by 5e = e — eo and 5L = L — L . 
Further simplification is made by the assumption that different protein conformational states 
are uncorrelated. This independence assumption also provides a way of introducing large 
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fluctuations onto the energy landscape. It is also likely that by applying this assumption 
one has already brought in some ingredients of cooperativity. With this approximation the 
average free energy and other thermodynamic quantities can be derived through the use of 
the microcanonical ensemble analysis. The result is F(p) = E' — TS, where T is a scaled 
temperature, 



E' 



-N 



Ae 2 + AL 2 
T 



Se- — I p+ [5L 



T 



T 



P 



(4) 



is the energy function of the system and 



S 



-N 



plogp + (1 - p) log 



l-p\ Ae 2 (l-p) + AL 2 (l-p 2 ) 

2T 2 



(5) 



is the entropy of the system. 

The kinetic folding process is approximated by the Metropolis dynamics: 

<E2 ~ El) ] for E 2 > Ek 

R{E\ — > E 2 ) 

for E 2 < Ei. 




(6) 



where R(Ei — > E 2 ) represents the transition rate for a single polypeptide chain from state 
1 to 2 with total energies E\ and E 2 , respectively. Rq is a overall constant describing the 
inverse time scale for the transition process between configurations (usually Rq has the order 
of inverse nanoseconds). Therefore the transition rate from one conformational state to a 
neighboring state is determined by the energy difference of these two states. Further analytic 
treatment to this problem is made by utilizing the continuous time random walk (CTRW) 
23| . By this construction one is able to reduce the multi-dimensional random walk problem 



to a one-dimensional CTRW, resulting in a generalized master equation. Schematically, one 
can first categorize the energy landscape by the order parameter p, along which an energy 
distribution function P(E,p) is given (which is a Gaussian with mean and width shown in 
Eqs. (2) and (3)). With the use of Metropolis dynamics one can calculate the associated 
transition rate distribution function P(R,p), specifying the jumping rate R for a molecule 
at a state with order parameter p to its neighboring states, and therefore obtain the corre- 
sponding waiting-time distribution \I/(t, p) for a molecule to stay at a conformational state 



for time r before it leaves. A CTRW can be constructed by knowing both the waiting-time 
distribution for the system and the jumping probabilities between successive p's. The latter 
is approximated to be time-independent, which is equivalent to the quasi-equilibrium as- 
sumption. By this assumption one can calculate these probabilities utilizing the asymptotic 
distribution: 

Urn G(p,r) oc e -^W, (7) 

where G(p, r) is the probability distribution function for the polypeptide chain at time r, 
and P=l/T. 

A generalized kinetic master equation can thus be derived using the CTRW approxima- 
tion and conveniently represented in the Laplace-transformed space as: 

sG(p,s) -n (p) =JC(p,s)G(p, s) (8) 

where n (p) is the initial distribution of G(p,r) and K,(p,s) is a linear operator which is 
related to both the waiting time distribution and the jumping probabilities. In the local 
connectivity case the generalized master equation is reduced to a generalized (By generalized, 
we mean instead of the usual Fokker-Planck equation where diffusion is a constant in time 
representing a typical kinetic Markovian behavior, here we obtain a non-Markovian diffusion 
kernel in time due to the dimensional reduction from multiple one to a single p) Fokker- 
Planck equation in the Laplace-transformed space: 



d f 

sG(p, s) - n (p) = — lD(p, s) 



G(p,s)j-U(p,s) + j-G(p,s) 



(9) 



where 



s is the Laplace transform variable over time r, and D(p,s) is the frequency-dependent 
diffusion parameter. F(p) is the average free energy derived from the random energy model. 
The explicit expression for D(p, s) is given in the appendix, s, which has an unit of inverse 



time, is the Laplace transform variable over time r. G(p, s) is the Laplace transform of 
G(p,r), which is the probability density function such that G(p,r)dp is the probability for 
a protein to stay between p and p + dp at time r. Here no(p) is the initial condition for 
G(p,r). 

The boundary conditions for the generalized Fokker-Planck equation are set as a reflect- 
ing one at p = 0, where all the residues are in their non-native states: 



G(p,s)j-U(p,s) + j- p G(p,s) 



0, 



p=0 



and an absorbing one at p — pf, where most of the residues are in the native states: 

G( Pf ,s)=0. 

The choice of an absorbing boundary condition at p = pj facilitates our calculation for the 
first-passage time distribution. 

Alternatively, one can rewrite this generalized Fokker-Planck equation in its integral 
equation form by integrating it twice over p: 



[Pf [P r ~ 

G(p, s) = - J dp dp' [sG(p , s) - n (p ) 

= - ( Pi dp' f dp" \sG{p", s) - n {p" 
Jp Jo 1 



exp [U(p',s)-U(p,s)] 
D(p',s) 
K(p') 



D(p,s)K(p)> 



(11) 



where 



K{ P ) 



(12) 



D( P ,oy 

In this paper, the first-passage time (FPT) to reach pj (that is, the time required for the 
random walker to visit order parameter value pf for the first time) is used as a typical or 
representative time scale for folding. One has the following relation for the FPT distribution 
function Pfpt{j) '■ 



Pfpt(t) = - (1 _ E) = -f T 



(13) 



where 



S(r)= f Pf dpG(p,r). 
Jo 



(14) 



The moments of the FPT distribution function are calculated from the following relation: 

k dE(r) 



roc roc 

(r n )= / drr n P FPT (r) = - drr* 
Jo Jo 



dr 



roo rpf roo 

n dTT n - 1 Z(r)=n dp drr n ' l G(p,T) 
Jo Jo Jo 



n-1 



s=0 



(15) 



If we make a series expansion of G(p, s) and l/D(p,s): 



and 



s) = G (p) + sGi(p) + s 2 G 2 {p) + 



l/.D(p, s) = a (p) + sai(p) + s 2 a 2 (p) + 



(16) 



(17) 



then we have 



and Eq. (9) becomes 



(r») = nK-l)"" 1 I"' dpG n ^(p), 
Jo 



G (p)= I" 1 ' dp' [ P dp"n (p")a (p)K(p,p') 
J p Jo 



(18) 



(19) 



and for n > 



rpt rp" 

G n+1 (p) = - dp' dp" 
J p Jo 



G n -j(p")aj(p) - n (p")a n+1 (p) 

3=0 



K(p,p') (20) 



by matching each coefficient of s n in Eq. (16) and Eq. (17). Therefore one can calculate 
G n (p) recursively. Mean while, one can also solve the integral equation (11) directly for 
G(p, s), and by the observation that 



Pfpt(s) = 1 - sE(s), 



(21) 



where Pfpt(s) and S(s) are Laplace transforms of Pfpt{ t ) and S(r), respectively, we can 
investigate Pfpt{t) by studying the behavior of Pfpt(s). To solve Eq. (11) numerically, one 
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first replaces the integrations by discrete summations (recalling that originally this model is 
itself constructed in a discrete order-parameter space with equal spacings). Because Eq. (11) 
is linear in G(p, s), one can solve for G(p, s) in the discrete p space by a matrix inversion 
technique. In the discrete version Eq. (11) becomes 

Gi = - ^(ApfK^D^I^Kj.^sG, - rf 0fe ), (22) 

where the integral operators become matrices 

hij = 
hij = 

i, j, and k are discrete labels of the order parameter. K and D are diagonal matrices with 
nonzero elements K(pi) and D(pi, s), respectively. G and n are vectors of elements G(pi, s) 
and no(pi)- With these notations one can easily get 

G = (DK + shKhy'hKhno (24) 

In our calculations we make the matrix inversion with a Gaussian elimination method with 
scaled-column pivoting, and the results don't change very wildly with the number of grids 
we choose in the discrete space, so we conclude this is a stable procedure for solving G(p, s) 
and therefore P FPT (s). 

In summary, one first defines an order parameter p describing the direction of the folding 
process. Then one assigns the distribution of energy states P(E,p) along each p. The 
next step is to specify the network, i.e., the geometry of the energy landscape. In the 
present model this is done by random connections between successive p. Finally, in order 
to simplify the complex multi-dimensional random walk problem into an one-dimensional 
problem, one utilizes the approach of continuous time random walk (CTRW) and squeezes 
the vast information on the energy landscape into the basic two requirements for constructing 
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1 if i > j, 

otherwise. 

1 if i < j, 

(23) 

otherwise. 



a CTRW: a waiting-time distribution function and jumping probabilities among different p's. 
From the CTRW one gets a generalized master equation, and in the local-connecting case 
it simplifies into a generalized Fokker-Planck equation (Eq. (9)). 

One should note that the original Hamiltonian (Eq. (1)) does not suffice in building the 
whole energy landscape. In the approach used here it is supplemented by the assumption of 
independence among different energy states (The effects of correlations have been discussed 
elsewhere [p4H.). The resulting energy landscape has many features which may resemble 
real polypeptide chains. First, there exists local energy traps, some of which are very 
deep. Second, near the folded state the energy fluctuations become much smaller, and 
diffusions close to the folded state are fast (see next section). This implies that the folded 
state may have some degrees of flexibility, allowing certain occupation of substates and 
conformational changes. Finally, the random connecting network (energy landscape) builds 
a strongly interacting environment in which some effects of the cooperativity may have 
already been reflected. The current approach is a phenomenological one which captures 
the physics on the coarse-grained or renormalized level. This is in analogy to the Landau- 
Ginzburg approach for treating critical phenomena. A microscopic Hamiltonian that is both 
computationally manageable and quantitatively faithful to real folding is currently still not 
available. The definition of the order parameter p needs to be looked at more carefully, since 
there might be ambiguities in defining the fraction of native conformations in real proteins. 
However, parameters like Se and Ae appear to characterize the bias and the fluctuation on 
the energy landscape faithfully. Hence, one can hope to build up a model which has several 
important features resembling real proteins in a semiquantitative way, and it is interesting 
to see how far one can go with this approach. 



III. RESULTS AND DISCUSSIONS 

We start the numerical calculations of moments and distributions of FPT by setting 
Rq = 10 9 s _1 , N = 100 and v = 10. For simplicity we assume 6e = 6L and Ae = AL. The 
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ratio of the energy gap between the native state and the average non-native states versus 
the fluctuations of the non-native states, 5e/Ae, which represents the relative degree of the 
funnel slope or the bias towards the native state compared with the local roughness or the 
traps of the folding landscape, becomes an adequate parameter for this model. We set the 
initial distribution of the protein molecules to be n (p) = 8(p — p^, where pi is set to be 
0.05. In our calculations we set pf = 0.9, which means that 90 percent of the amino acids 
are in their native states. We calculate the moments (r n ) of the FPT distribution function 
by utilizing Eqs. (11)-(21). 

Fig. |l| shows an example of D(p,s = 0) for various settings of Ae/T. One can see clearly 
that the diffusion is very fast near pf. This means that usually the conformation changes 
near the folded state don't play a major role in the folding time. It also indicates the 
flexibility in conformational changes near the folded state. At higher temperature (hence 
smaller Ae/T) the diffusion parameter for different p's has less distinction, whereas at low 
temperature the diffusion process varies in large orders of magnitude with p. 

The MFPT (r) for the folding process versus different scaled inverse temperatures are 
plotted in Fig. [2] for various settings of the parameter 8e/Ae. Note that the vertical axis is in 
the logarithmic scale. We have a letter V curve for each fixed Se/Ae. At high temperature, 
the MFPT is large although the diffusion process itself is fast (i.e., D(p,s) is large). This 
long-time folding behavior is due to the instability of the folded state. The MFPT is also 
large at low temperature. This is due to the importance of the onset of the low energy 
non-native trapped states. 

The MFPT reaches its minimum at a transition temperature T . By comparing this 
minimum for various values of <5e/Ae, we find that the minimum of MFPT gets smaller 
by increasing the ratio of energy bias versus roughness (see Fig. |3|). This indicates that 
a possible criterion to select the subset (subspace) of the whole sequence space leading to 
well designed fast folding protein is the maximization of 5e/Ae. In other words, one has 
to choose the sequence subspace such that the global bias overwhelms the roughness of the 
energy landscape |25| . 
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By comparing the MFPT curves for various set of 5e/ Ae, we found that when T < To, the 
MFPT is only dependent on Ae/T. This indicates that the roughness condition dominates 
in the MFPT for this temperature regime. On the other hand, when T > To, the MFPT 
is mainly dependent on 5e/T and has less dependence on Ae/T. This suggests that the 
transition at T = T marks the crossover between competing contributions due to Ae/T and 
Se/T. 

We also calculate the higher moments of the FPT distribution. In Fig. § we present 
the results for (r 2 ) as an illustrated example. Again we have a V shape curve for each 
setting of <5e/Ae with the minimum of the curve having a temperature close to To. Similar 
to the behavior of (r), at low temperature (r 2 ) is only dependent on Ae/T, and in the high 
temperature regime it is mainly dependent on Se/T. For higher moments we also obtain the 
same conclusion. 

In Fig. [5| we show the behavior of the reduced second moment, (t 2 )/ (t) 2 . We find that 
the reduced second moment starts to diverge at temperature around T , where MFPT is 
at its minimum. The degree of divergence increases rapidly as temperature drops below 
Tq. This indicates that the average is not a good representative of the system any more 
and a long tail in the FPT distribution is developed. The intermittency where rare events 
make great contribution occurs. The divergence of the second moment also shows that the 
dynamics is exhibiting non-self-averaging behavior. 

^From the study of higher moments, we find at high temperature the relationship 
(r n ) = n\(r) n . Therefore the FPT distribution function is Poissonian in the high tem- 
perature regime. But when T < T , it is hard to get more information from the moments 
because of their diverging behavior. On the other hand, the folding dynamics can be also 
studied by solving the linear integral equation (11) directly by making the inversion of the 
linear operator. We investigate the behavior of the FPT distribution function in the Laplace- 
transformed space derived via Eqs. (12) and (19). Our result shows that for T < To Pfpt{s) 
decays slowly over decades, which suggests that the usual numerical Laplace inversion tech- 
niques cannot be applied. Moreover, if we plot log(— log P(s)) versus logs, we find that 
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there is approximately a linear relation over several orders of magnitude in s (see Fig. |6| 
for example). This indicates that for T < T P FPT (s) can be approximated by a stretched 
exponential: 

Pfpt(s) « e~ csa , (25) 

which is the Laplace transform of the Levy distribution in the time space. So we have 

1 ~ (-c) n T(an + 1) . 

Pfpt(t) « > - — tt^ sm(7rcm) . (26) 

7r^T aB + 1 r(n+l) 

a lies between and 1. From the asymptotic property of the Levy distribution function we 
learn that Pfpt(j) ~ r~( 1+a ) for large r. In Fig. |7| we make a plot of the exponent a versus 
Ae/T for the case 5e/Ae = 4.0. We find that a decreases when the temperature decreases. 

From the results above, we find that for a fixed energy landscape, there exists a dynamic 
transition temperature To. When the temperature is lager than To, the FPT distribution 
is Possionian, which means exponential kinetics. In this case the the underlined energy 
landscape of the polypeptide chain is rather smooth, and therefore the folding process just 
follows the valleys and saddle points on the landscape with minor fluctuations. On the other 
hand, when the temperature is below T , the second and higher reduced FPT moments 
diverge, and the FPT distribution exhibits a power-law decay behavior. This indicates that 
the folding process is non-self-averaging and achieved through numerous timescales. In this 
case the underlined energy landscape of the polypeptide chain is rather rough. Nearby 
folding paths on the energy landscape may have big differences in their energy barriers. 
In Fig. |] we make a "phase diagram" showing this dynamical character. We found that 
for a fixed Se/Ae (which corresponds to a straight line through the origin in the phase 
diagram), when the temperature is lowered, the system goes through a transition from 
self-averaging to non-self-averaging behavior. Furthermore, for a fixed Se/T, the dynamics 
becomes non-self-averaging after we increase Ae/T over some critical value. This can be 
accounted intuitively since when one increases Ae/T, roughness on the energy landscape 
becomes more prominent, and different folding paths will lead to very distinct folding times. 
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It is less intuitive however to see that if we fix the control parameter Ae/T and increase 
Se/T, the dynamics also becomes non-self-averaging. This can be understood by viewing 
that the MFPT is decreasing if we keep increasing Se/T until a transition point at which 
it losses its dominance. When Se/T is well below its transition point, the dynamics is 
primarily dominated by the effects of the relative bias (Se/T) and is therefore self-averaging. 
When Se/T is larger beyond this transition point, the FPT distribution depends mainly on 
the relative roughness (Ae/T) and the dynamics becomes non-self- averaging. The results 
will not be much different if we keep on increasing Se/T. Finally, from our results, one 
sees that the non-self-averaging phenomena do not necessarily indicate slow dynamics. For 
example, when one fixes Ae/T and increases Se/T, the dynamics eventually becomes non- 
self- averaging. However, the MFPT keeps decreasing until it reaches the transition point, 
after which the MFPT keeps approximately constant. 

For comparison, we also calculate the folding transition temperature Tf by solving the 
corresponding unfolding problem. Specifically, we start from pi = 0.9 and calculate the 
first-passage time to pf = 0.1. The unfolding MFPT curve intersects the folding MFPT one 
on its high-temperature branch. Since one expects the inverse of MFPT to characterize the 
rate of the kinetic process, this point of intersection can reasonably be used to locate the 
folding transition, where the kinetic rates of the folding and unfolding processes are equal. 
The comparison of this folding transition temperature Tf with the dynamical transition 
temperature T is listed in Table 1. We find that in all our examples T is strongly correlated 



with Tj, which is in agreement with the experimental facts and simulation results p6 . 

As discussed above, this dynamical transition results from the competition between dif- 
ferent controlling parameters. In our calculations, we only have two such terms: Se and Ae, 
and the result is a sharp crossover around T . One should note that in general Se ^ SL and 
Ae ^ AL. Furthermore, one has to take into consideration the three- and higher-body inter- 
actions among amino acid residues, which are also necessary for accounting the cooperative 
behavior in folding dynamics. In reality one would expect the sharp crossover transition to 
be smoothed out if one take into account a more comprehensive set of controlling parameters. 
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In single-molecule folding experiments, it is now possible to measure not only the mean 
but also the fluctuations and moments as well as the distribution of folding times ||20|| . In 
different experimental and sequence conditions, one can see different behaviors of the folding 
time and its distributions. A well designed fast-folding sequence with suitable experimental 
condition exhibits self averaging and simple rate behavior. Multiple routes are parallel and 
lead to folding. A less well designed sequence will not only fold slowly but also often exhibit 
non-self-averaging nonexponential rate behavior, indicating the existence of intermediate 
states and local traps. In this case, discrete paths to the folded state emerge. The folding 
process is sensitive to which kinetic path it takes. Certain specific kinetic paths give the 
crucial contribution to folding (rare events contribute most, indicating intermittency) . One 
can use single-molecule experiments to uniquely determine the fundamental mechanisms and 
intrinsic features of the protein folding. In typical bulk experiments with large populations, 
it is very hard to distinguish whether the non-exponential behavior is intrinsic for each 
individual molecule or just the result of inhomogeneous averaging of the large sample of 
molecules with each individual molecule exhibiting single exponential behavior. 

It is worth mentioning that in this paper although we focus on the study of the protein 
folding problem, the approach we use is generally applicable to problems with barrier crossing 
on a multi-dimensional complex energy landscape. The main ingredient for this model is 
Brownian motion on a rough mult i- dimensional landscape, or equivalently, a random walk 
on a complex network with frustrated (highly fluctuating) environment. Since this is quite 
general and universal, we expect our results may also be able to account for a large class 



of phenomena. In fact the experiments on glasses, spin glasses, viscous liquids |L3| and 
conformational dynamics already show the existence of non-exponential distribution at low 
temperatures. In particular, a recent experiment on single- molecule enzymatic dynamics |21[ 



shows explicitly the Levy like distribution of the fluctuation time for the underlined complex 
protein energy landscape. A theoretical investigation of the complex-system dynamics also 
showing the Levy like behavior using fractal diffusion approach has recently been carried 
out [|27|. Our approach provides a "microscopic" foundation and rationale for the fractal 
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model. 

IV. CONCLUSION 

In this report, we have studied the kinetics of folding along a locally-connected order- 
parameter path. We used a generalized Fokker-Planck diffusion equation for describing the 
folding dynamics. We further solved the equation and obtained the expression for the MFPT. 
The effects of a stability gap 5e, temperature, and fluctuations of the folding landscape on 
FPT were discussed. We found that the MFPT is smaller when the ratio of the energy gap 
between the native and average non-native states versus the fluctuations of the landscape, 
Se/Ae is larger. 

The fluctuations and higher-order moments were also studied. It was found that for 
temperatures well above T , the folding process is self- averaging and its FPT distribution 
obeys a Poisson distribution. But when the temperature is lower, the fluctuations start to 
diverge. This means that the actual folding process may happen on multiple time scales, 
and the non-self-averaging behavior emerges. In this case, the full distribution of the FPT 
is required in order to characterize the system. From our analysis the distribution of FPT 
turns out to be close to a Levy distribution, which has a power-law tail for long time. One 
expects to be able to see this kind of fluctuations in single-molecule experiments. In the 
bulk measurements the average fluctuations are reduced due to the central limit theorem. 
Further more one cannot tell if the fluctuations are either from the intrinsic properties of 
the molecules or the inhomogeneous averages over molecules. Our analytical results of the 
MFPT and its self- or non-self- averaging behavior may provide a possible kinetic basis for 
the criterion in selecting a subset of the whole sequence space to reach well-designed and 
fast-folding proteins. 

By carefully reexamining this model one finds that there are still open questions for this 
whole set of constructions. First, it is not clear whether the order parameter p is the best 
reaction coordinate for folding. Second, correlations between different states are ignored. 
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Third, in the model it is assumed that each conformational state has Nv + 1 neighboring 
states, and the number of total conformal states is equal to (y + 1) N . These might be 
overestimated. In the real case, due to steric restrictions, the number of available conformal 
states and their connections might be greatly reduced. Nevertheless, from the results in 
this paper, we see that this phenomenological model indeed characterizes many features and 
provides several important insights into the protein-folding problem. 



APPENDIX 

In this appendix we give an explicit expression for the frequency-dependent diffusion 
parameter D(p, s), where s is the Laplace transform variable over time r: 



D(p,s) = 




The average ( )r is taken over P(R, p), the probability distribution function of the transition 
rate R from one state with order parameter p to its neighboring states, which may have order 
parameters equal to p — l/N, p or p+l/N, and A (p) = 1/V + ( 1 — 1 /V)p is the probability for 
a molecule to move to a state with p = p + l/Norp = p — l/N when it leaves a state with 
p = p Q . The probability distribution P(R, p) is one of the main ingredient for characterizing 
the dynamical behavior of this protein folding model. For example, if the energy landscape 
is smooth, the transition rates are distributed narrowly and can be described by a single 
scale. Then we have the usual Markovian diffusion. As an illustrated example, if one sets 
P(R,p) ~ 5(R-R a (p)) for some specific R a (p), then D(p, s) « (\(p)R a (p)) / (2N 2 ) , which is 
independent of s. Hence Eq. (9) just reduces to the normal Fokker-Planck equation, and the 
folding process can therefore be well described by a single time scale. On the other hand, if 
the distribution P(R,p) is broad over R, then the dynamic behavior can not be discussed 
by using a single time scale. This results a generalized Fokker-Planck equation with the 
diffusion kernel in time. In this case one would expect a broader distribution in the FPT 
distribution P fpt {t). 



19 



To derive the diffusion kernel, one first notice that the kinetic rate of jumping from one 
configuration to another is a stochastic variable 

R = R £ exp(-^-^) +R J2 1- 

J Ei<E 

where R is a function of a stochastic variable energy £\ Therefore by knowing the distribu- 
tion of E, one can derive the probability distribution of R as well. The explicit expression 
for P(R,p) has been derived in the literature 0], and we just quote the result here: 

1 



P(R,p) = < 



.V2(3AE(p), 



6(R - RoNu) 



(28) 



for R = R Nu, 



P(R,p) 



/3A(p) 



2 log 



RoNu 
R 



1/2 



for RqNu > R > R sep (p), and 
P(R,p)- 



R n Nu 



1 



4.io g| ^); 1/2 



(29) 



exp 



log 2 (R/R sep (p)) 



(30) 



'2irP 2 AE 2 (p)R? { 2(3 2 AE 2 (p) 

for R sep (p) > R > R s iow{p)- P(R, p) = for R < R s i oq and R > R Nv. Here (3 is the inverse 
temperature, R sep {p) = RqNv exp(-p 2 AE 2 /2), and R s i ow (p) = RoNv exp[(3 2 AE 2 (p)/2 - 
V2S*(3AE(p)], where 

1-Pl 



-plog(l-p) - (1-p) log- 



(31) 



which is the configuration entropy for Np native residues. The function ((x) is defined as 

C(x) = ^= Hdye-y 2 , (32) 

\ 71 Jx 



which is equal to the complement error function erfc(a;) divided by 2 for x > 0. 

From Eq. (p7|), one can make a series expansion of 1/D(p,s) with respect to s as in 
Eq. (17), which facilitates the calculation for the moments of the FPT distribution.. Here 
we just list the first several terms for a n (p): 
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«o(p) = J(pj Rl ^ 
IN 2 

ai( P ) = J^iMp) ~ R\{P)) 
27V 2 

a 2 ( P ) = ttt(^3(p) - 2R 2 (p)R 1 (p) + Rl(p)) (33) 

Hp) 

and etc., where R n (p) = (l/i?™)i?(p). In our calculations we carry out the average over rate 
distribution ( )r by numerical integrations. 
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FIGURES 

FIG. 1. D(p, 0) for various settings of Ae/T. At low temperature the diffusion timescale varies 
in orders of magnitude over p. 

FIG. 2. MFPT versus different scaled inverse temperatures ((a): Ae/T; (b): Se/T; {c):Tq/T) 
for various Se/Ae. For each fixed Se/Ae, the curve is like an V shape. The temperature at which 
MFPT is at its bottom is defined as the transition temperature To. As 5e/Ae increases, the 
minimum of MFPT decreases. 

FIG. 3. The minimum of MFPT (r) m j„ in a logarithmic scale versus Se/ Ae. There exists a 
monotonic relationship between (r) m j n and Se/Ae. As 5e/Ae increases, the minimum of MFPT 
monotonically decreases. 

FIG. 4. (r 2 ) versus reduced inverse temperature Ae/T for various 5e/Ae. For each fixed 
5e/Ae, the curve is like an V shape. The temperature at which MFPT is at its bottom is defined 
as the transition temperature To. As Se/ Ae increases, the minimum of (r 2 ) decreases. 

FIG. 5. (t 2 )/(t) 2 versus reduced inverse temperature Ae/T for various Se/Ae. At high tem- 
perature this value keeps finite and the folding process is self-averaging. As the temperature drops, 
the value starts to diverge and non-self- averaging behavior emerges. 

FIG. 6. An illustrative figure of the power law distribution as compared with a Gaussian normal 
distribution. Notice the prominent fatty tails of power law distribution. 

FIG. 7. The exponent a versus Ae/T for the case Se/Ae = 4.0. Below the transition tempera- 
ture To, the FPT distribution is close to a Levy distribution, and for large time it has a power-law 
tail : P fpt {t) ~ r-( 1+a ). We see that below To, a decreases when the temperature gets lower. 

FIG. 8. Dynamic phase diagram showing the transition between self- and non-self-averaging 
dynamics. 
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TABLES 



5e/Ae 3.2 3.6 4.0 4.4 

Ae/T 0.370 0.336 0.309 0.286 

Ae/T f 0.339 0.304 0.276 0.252 

Tf/T 1.091 1.105 1.120 1.135 

TABLE I. Comparison between the thermodynamic folding transition temperature Tj and the 

dynamic transition temperature Tq. 
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